home *** CD-ROM | disk | FTP | other *** search
- Subject: v23i044: Flex, a fast lex replacement, Part08/10
- Newsgroups: comp.sources.unix
- Approved: rsalz@uunet.UU.NET
- X-Checksum-Snefru: ca66ccd4 e23a0f71 33039016 677e8981
-
- Submitted-by: Vern Paxson <vern@cs.cornell.edu>
- Posting-number: Volume 23, Issue 44
- Archive-name: flex2.3/part08
-
- #! /bin/sh
- # This is a shell archive. Remove anything before this line, then feed it
- # into a shell via "sh file" or similar. To overwrite existing files,
- # type "sh file -c".
- # The tool that generated this appeared in the comp.sources.unix newsgroup;
- # send mail to comp-sources-unix@uunet.uu.net if you want that tool.
- # Contents: flexdoc.1.02 misc.c nfa.c
- # Wrapped by rsalz@litchi.bbn.com on Wed Oct 10 13:24:04 1990
- PATH=/bin:/usr/bin:/usr/ucb ; export PATH
- echo If this archive is complete, you will see the following message:
- echo ' "shar: End of archive 8 (of 10)."'
- if test -f 'flexdoc.1.02' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'flexdoc.1.02'\"
- else
- echo shar: Extracting \"'flexdoc.1.02'\" \(15514 characters\)
- sed "s/^X//" >'flexdoc.1.02' <<'END_OF_FILE'
- XAnother area where the user can increase a scanner's performance
- X(and one that's easier to implement) arises from the fact that
- Xthe longer the tokens matched, the faster the scanner will run.
- XThis is because with long tokens the processing of most input
- Xcharacters takes place in the (short) inner scanning loop, and
- Xdoes not often have to go through the additional work of setting up
- Xthe scanning environment (e.g.,
- X.B yytext)
- Xfor the action. Recall the scanner for C comments:
- X.nf
- X
- X %x comment
- X %%
- X int line_num = 1;
- X
- X "/*" BEGIN(comment);
- X
- X <comment>[^*\\n]*
- X <comment>"*"+[^*/\\n]*
- X <comment>\\n ++line_num;
- X <comment>"*"+"/" BEGIN(INITIAL);
- X
- X.fi
- XThis could be sped up by writing it as:
- X.nf
- X
- X %x comment
- X %%
- X int line_num = 1;
- X
- X "/*" BEGIN(comment);
- X
- X <comment>[^*\\n]*
- X <comment>[^*\\n]*\\n ++line_num;
- X <comment>"*"+[^*/\\n]*
- X <comment>"*"+[^*/\\n]*\\n ++line_num;
- X <comment>"*"+"/" BEGIN(INITIAL);
- X
- X.fi
- XNow instead of each newline requiring the processing of another
- Xaction, recognizing the newlines is "distributed" over the other rules
- Xto keep the matched text as long as possible. Note that
- X.I adding
- Xrules does
- X.I not
- Xslow down the scanner! The speed of the scanner is independent
- Xof the number of rules or (modulo the considerations given at the
- Xbeginning of this section) how complicated the rules are with
- Xregard to operators such as '*' and '|'.
- X.LP
- XA final example in speeding up a scanner: suppose you want to scan
- Xthrough a file containing identifiers and keywords, one per line
- Xand with no other extraneous characters, and recognize all the
- Xkeywords. A natural first approach is:
- X.nf
- X
- X %%
- X asm |
- X auto |
- X break |
- X ... etc ...
- X volatile |
- X while /* it's a keyword */
- X
- X .|\\n /* it's not a keyword */
- X
- X.fi
- XTo eliminate the back-tracking, introduce a catch-all rule:
- X.nf
- X
- X %%
- X asm |
- X auto |
- X break |
- X ... etc ...
- X volatile |
- X while /* it's a keyword */
- X
- X [a-z]+ |
- X .|\\n /* it's not a keyword */
- X
- X.fi
- XNow, if it's guaranteed that there's exactly one word per line,
- Xthen we can reduce the total number of matches by a half by
- Xmerging in the recognition of newlines with that of the other
- Xtokens:
- X.nf
- X
- X %%
- X asm\\n |
- X auto\\n |
- X break\\n |
- X ... etc ...
- X volatile\\n |
- X while\\n /* it's a keyword */
- X
- X [a-z]+\\n |
- X .|\\n /* it's not a keyword */
- X
- X.fi
- XOne has to be careful here, as we have now reintroduced backtracking
- Xinto the scanner. In particular, while
- X.I we
- Xknow that there will never be any characters in the input stream
- Xother than letters or newlines,
- X.I flex
- Xcan't figure this out, and it will plan for possibly needing backtracking
- Xwhen it has scanned a token like "auto" and then the next character
- Xis something other than a newline or a letter. Previously it would
- Xthen just match the "auto" rule and be done, but now it has no "auto"
- Xrule, only a "auto\\n" rule. To eliminate the possibility of backtracking,
- Xwe could either duplicate all rules but without final newlines, or,
- Xsince we never expect to encounter such an input and therefore don't
- Xhow it's classified, we can introduce one more catch-all rule, this
- Xone which doesn't include a newline:
- X.nf
- X
- X %%
- X asm\\n |
- X auto\\n |
- X break\\n |
- X ... etc ...
- X volatile\\n |
- X while\\n /* it's a keyword */
- X
- X [a-z]+\\n |
- X [a-z]+ |
- X .|\\n /* it's not a keyword */
- X
- X.fi
- XCompiled with
- X.B -Cf,
- Xthis is about as fast as one can get a
- X.I flex
- Xscanner to go for this particular problem.
- X.LP
- XA final note:
- X.I flex
- Xis slow when matching NUL's, particularly when a token contains
- Xmultiple NUL's.
- XIt's best to write rules which match
- X.I short
- Xamounts of text if it's anticipated that the text will often include NUL's.
- X.SH INCOMPATIBILITIES WITH LEX AND POSIX
- X.I flex
- Xis a rewrite of the Unix
- X.I lex
- Xtool (the two implementations do not share any code, though),
- Xwith some extensions and incompatibilities, both of which
- Xare of concern to those who wish to write scanners acceptable
- Xto either implementation. At present, the POSIX
- X.I lex
- Xdraft is
- Xvery close to the original
- X.I lex
- Ximplementation, so some of these
- Xincompatibilities are also in conflict with the POSIX draft. But
- Xthe intent is that except as noted below,
- X.I flex
- Xas it presently stands will
- Xultimately be POSIX conformant (i.e., that those areas of conflict with
- Xthe POSIX draft will be resolved in
- X.I flex's
- Xfavor). Please bear in
- Xmind that all the comments which follow are with regard to the POSIX
- X.I draft
- Xstandard of Summer 1989, and not the final document (or subsequent
- Xdrafts); they are included so
- X.I flex
- Xusers can be aware of the standardization issues and those areas where
- X.I flex
- Xmay in the near future undergo changes incompatible with
- Xits current definition.
- X.LP
- X.I flex
- Xis fully compatible with
- X.I lex
- Xwith the following exceptions:
- X.IP -
- XThe undocumented
- X.I lex
- Xscanner internal variable
- X.B yylineno
- Xis not supported. It is difficult to support this option efficiently,
- Xsince it requires examining every character scanned and reexamining
- Xthe characters when the scanner backs up.
- XThings get more complicated when the end of buffer or file is reached or a
- XNUL is scanned (since the scan must then be restarted with the proper line
- Xnumber count), or the user uses the yyless(), unput(), or REJECT actions,
- Xor the multiple input buffer functions.
- X.IP
- XThe fix is to add rules which, upon seeing a newline, increment
- Xyylineno. This is usually an easy process, though it can be a drag if some
- Xof the patterns can match multiple newlines along with other characters.
- X.IP
- Xyylineno is not part of the POSIX draft.
- X.IP -
- XThe
- X.B input()
- Xroutine is not redefinable, though it may be called to read characters
- Xfollowing whatever has been matched by a rule. If
- X.B input()
- Xencounters an end-of-file the normal
- X.B yywrap()
- Xprocessing is done. A ``real'' end-of-file is returned by
- X.B input()
- Xas
- X.I EOF.
- X.IP
- XInput is instead controlled by redefining the
- X.B YY_INPUT
- Xmacro.
- X.IP
- XThe
- X.I flex
- Xrestriction that
- X.B input()
- Xcannot be redefined is in accordance with the POSIX draft, but
- X.B YY_INPUT
- Xhas not yet been accepted into the draft (and probably won't; it looks
- Xlike the draft will simply not specify any way of controlling the
- Xscanner's input other than by making an initial assignment to
- X.I yyin).
- X.IP -
- X.I flex
- Xscanners do not use stdio for input. Because of this, when writing an
- Xinteractive scanner one must explicitly call fflush() on the
- Xstream associated with the terminal after writing out a prompt.
- XWith
- X.I lex
- Xsuch writes are automatically flushed since
- X.I lex
- Xscanners use
- X.B getchar()
- Xfor their input. Also, when writing interactive scanners with
- X.I flex,
- Xthe
- X.B -I
- Xflag must be used.
- X.IP -
- X.I flex
- Xscanners are not as reentrant as
- X.I lex
- Xscanners. In particular, if you have an interactive scanner and
- Xan interrupt handler which long-jumps out of the scanner, and
- Xthe scanner is subsequently called again, you may get the following
- Xmessage:
- X.nf
- X
- X fatal flex scanner internal error--end of buffer missed
- X
- X.fi
- XTo reenter the scanner, first use
- X.nf
- X
- X yyrestart( yyin );
- X
- X.fi
- X.IP -
- X.B output()
- Xis not supported.
- XOutput from the
- X.B ECHO
- Xmacro is done to the file-pointer
- X.I yyout
- X(default
- X.I stdout).
- X.IP
- XThe POSIX draft mentions that an
- X.B output()
- Xroutine exists but currently gives no details as to what it does.
- X.IP -
- X.I lex
- Xdoes not support exclusive start conditions (%x), though they
- Xare in the current POSIX draft.
- X.IP -
- XWhen definitions are expanded,
- X.I flex
- Xencloses them in parentheses.
- XWith lex, the following:
- X.nf
- X
- X NAME [A-Z][A-Z0-9]*
- X %%
- X foo{NAME}? printf( "Found it\\n" );
- X %%
- X
- X.fi
- Xwill not match the string "foo" because when the macro
- Xis expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
- Xand the precedence is such that the '?' is associated with
- X"[A-Z0-9]*". With
- X.I flex,
- Xthe rule will be expanded to
- X"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
- XNote that because of this, the
- X.B ^, $, <s>, /,
- Xand
- X.B <<EOF>>
- Xoperators cannot be used in a
- X.I flex
- Xdefinition.
- X.IP
- XThe POSIX draft interpretation is the same as
- X.I flex's.
- X.IP -
- XTo specify a character class which matches anything but a left bracket (']'),
- Xin
- X.I lex
- Xone can use "[^]]" but with
- X.I flex
- Xone must use "[^\\]]". The latter works with
- X.I lex,
- Xtoo.
- X.IP -
- XThe
- X.I lex
- X.B %r
- X(generate a Ratfor scanner) option is not supported. It is not part
- Xof the POSIX draft.
- X.IP -
- XIf you are providing your own yywrap() routine, you must include a
- X"#undef yywrap" in the definitions section (section 1). Note that
- Xthe "#undef" will have to be enclosed in %{}'s.
- X.IP
- XThe POSIX draft
- Xspecifies that yywrap() is a function and this is very unlikely to change; so
- X.I flex users are warned
- Xthat
- X.B yywrap()
- Xis likely to be changed to a function in the near future.
- X.IP -
- XAfter a call to
- X.B unput(),
- X.I yytext
- Xand
- X.I yyleng
- Xare undefined until the next token is matched. This is not the case with
- X.I lex
- Xor the present POSIX draft.
- X.IP -
- XThe precedence of the
- X.B {}
- X(numeric range) operator is different.
- X.I lex
- Xinterprets "abc{1,3}" as "match one, two, or
- Xthree occurrences of 'abc'", whereas
- X.I flex
- Xinterprets it as "match 'ab'
- Xfollowed by one, two, or three occurrences of 'c'". The latter is
- Xin agreement with the current POSIX draft.
- X.IP -
- XThe precedence of the
- X.B ^
- Xoperator is different.
- X.I lex
- Xinterprets "^foo|bar" as "match either 'foo' at the beginning of a line,
- Xor 'bar' anywhere", whereas
- X.I flex
- Xinterprets it as "match either 'foo' or 'bar' if they come at the beginning
- Xof a line". The latter is in agreement with the current POSIX draft.
- X.IP -
- XTo refer to yytext outside of the scanner source file,
- Xthe correct definition with
- X.I flex
- Xis "extern char *yytext" rather than "extern char yytext[]".
- XThis is contrary to the current POSIX draft but a point on which
- X.I flex
- Xwill not be changing, as the array representation entails a
- Xserious performance penalty. It is hoped that the POSIX draft will
- Xbe emended to support the
- X.I flex
- Xvariety of declaration (as this is a fairly painless change to
- Xrequire of
- X.I lex
- Xusers).
- X.IP -
- X.I yyin
- Xis
- X.I initialized
- Xby
- X.I lex
- Xto be
- X.I stdin;
- X.I flex,
- Xon the other hand,
- Xinitializes
- X.I yyin
- Xto NULL
- Xand then
- X.I assigns
- Xit to
- X.I stdin
- Xthe first time the scanner is called, providing
- X.I yyin
- Xhas not already been assigned to a non-NULL value. The difference is
- Xsubtle, but the net effect is that with
- X.I flex
- Xscanners,
- X.I yyin
- Xdoes not have a valid value until the scanner has been called.
- X.IP -
- XThe special table-size declarations such as
- X.B %a
- Xsupported by
- X.I lex
- Xare not required by
- X.I flex
- Xscanners;
- X.I flex
- Xignores them.
- X.IP -
- XThe name
- X.bd
- XFLEX_SCANNER
- Xis #define'd so scanners may be written for use with either
- X.I flex
- Xor
- X.I lex.
- X.LP
- XThe following
- X.I flex
- Xfeatures are not included in
- X.I lex
- Xor the POSIX draft standard:
- X.nf
- X
- X yyterminate()
- X <<EOF>>
- X YY_DECL
- X #line directives
- X %{}'s around actions
- X yyrestart()
- X comments beginning with '#' (deprecated)
- X multiple actions on a line
- X
- X.fi
- XThis last feature refers to the fact that with
- X.I flex
- Xyou can put multiple actions on the same line, separated with
- Xsemi-colons, while with
- X.I lex,
- Xthe following
- X.nf
- X
- X foo handle_foo(); ++num_foos_seen;
- X
- X.fi
- Xis (rather surprisingly) truncated to
- X.nf
- X
- X foo handle_foo();
- X
- X.fi
- X.I flex
- Xdoes not truncate the action. Actions that are not enclosed in
- Xbraces are simply terminated at the end of the line.
- X.SH DIAGNOSTICS
- X.I reject_used_but_not_detected undefined
- Xor
- X.I yymore_used_but_not_detected undefined -
- XThese errors can occur at compile time. They indicate that the
- Xscanner uses
- X.B REJECT
- Xor
- X.B yymore()
- Xbut that
- X.I flex
- Xfailed to notice the fact, meaning that
- X.I flex
- Xscanned the first two sections looking for occurrences of these actions
- Xand failed to find any, but somehow you snuck some in (via a #include
- Xfile, for example). Make an explicit reference to the action in your
- X.I flex
- Xinput file. (Note that previously
- X.I flex
- Xsupported a
- X.B %used/%unused
- Xmechanism for dealing with this problem; this feature is still supported
- Xbut now deprecated, and will go away soon unless the author hears from
- Xpeople who can argue compellingly that they need it.)
- X.LP
- X.I flex scanner jammed -
- Xa scanner compiled with
- X.B -s
- Xhas encountered an input string which wasn't matched by
- Xany of its rules.
- X.LP
- X.I flex input buffer overflowed -
- Xa scanner rule matched a string long enough to overflow the
- Xscanner's internal input buffer (16K bytes by default - controlled by
- X.B YY_BUF_SIZE
- Xin "flex.skel". Note that to redefine this macro, you must first
- X.B #undefine
- Xit).
- X.LP
- X.I scanner requires -8 flag -
- XYour scanner specification includes recognizing 8-bit characters and
- Xyou did not specify the -8 flag (and your site has not installed flex
- Xwith -8 as the default).
- X.LP
- X.I
- Xfatal flex scanner internal error--end of buffer missed -
- XThis can occur in an scanner which is reentered after a long-jump
- Xhas jumped out (or over) the scanner's activation frame. Before
- Xreentering the scanner, use:
- X.nf
- X
- X yyrestart( yyin );
- X
- X.fi
- X.LP
- X.I too many %t classes! -
- XYou managed to put every single character into its own %t class.
- X.I flex
- Xrequires that at least one of the classes share characters.
- X.SH DEFICIENCIES / BUGS
- XSee flex(1).
- X.SH "SEE ALSO"
- X.LP
- Xflex(1), lex(1), yacc(1), sed(1), awk(1).
- X.LP
- XM. E. Lesk and E. Schmidt,
- X.I LEX - Lexical Analyzer Generator
- X.SH AUTHOR
- XVern Paxson, with the help of many ideas and much inspiration from
- XVan Jacobson. Original version by Jef Poskanzer. The fast table
- Xrepresentation is a partial implementation of a design done by Van
- XJacobson. The implementation was done by Kevin Gong and Vern Paxson.
- X.LP
- XThanks to the many
- X.I flex
- Xbeta-testers, feedbackers, and contributors, especially Casey
- XLeedom, benson@odi.com, Keith Bostic,
- XFrederic Brehm, Nick Christopher, Jason Coughlin,
- XScott David Daniels, Leo Eskin,
- XChris Faylor, Eric Goldman, Eric
- XHughes, Jeffrey R. Jones, Kevin B. Kenny, Ronald Lamprecht,
- XGreg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Marc Nozell, Esmond Pitt,
- XJef Poskanzer, Jim Roskind,
- XDave Tallman, Frank Whaley, Ken Yap, and those whose names
- Xhave slipped my marginal mail-archiving skills but whose contributions
- Xare appreciated all the same.
- X.LP
- XThanks to Keith Bostic, John Gilmore, Craig Leres, Bob
- XMulcahy, Rich Salz, and Richard Stallman for help with various distribution
- Xheadaches.
- X.LP
- XThanks to Esmond Pitt and Earle Horton for 8-bit character support;
- Xto Benson Margulies and Fred
- XBurke for C++ support; to Ove Ewerlid for the basics of support for
- XNUL's; and to Eric Hughes for the basics of support for multiple buffers.
- X.LP
- XWork is being done on extending
- X.I flex
- Xto generate scanners in which the
- Xstate machine is directly represented in C code rather than tables.
- XThese scanners may well be substantially faster than those generated
- Xusing -f or -F. If you are working in this area and are interested
- Xin comparing notes and seeing whether redundant work can be avoided,
- Xcontact Ove Ewerlid (ewerlid@mizar.DoCS.UU.SE).
- X.LP
- XThis work was primarily done when I was at the Real Time Systems Group
- Xat the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there
- Xfor the support I received.
- X.LP
- XSend comments to:
- X.nf
- X
- X Vern Paxson
- X Computer Science Department
- X 4126 Upson Hall
- X Cornell University
- X Ithaca, NY 14853-7501
- X
- X vern@cs.cornell.edu
- X decvax!cornell!vern
- X
- X.fi
- END_OF_FILE
- if test 15514 -ne `wc -c <'flexdoc.1.02'`; then
- echo shar: \"'flexdoc.1.02'\" unpacked with wrong size!
- fi
- # end of 'flexdoc.1.02'
- fi
- if test -f 'misc.c' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'misc.c'\"
- else
- echo shar: Extracting \"'misc.c'\" \(14912 characters\)
- sed "s/^X//" >'misc.c' <<'END_OF_FILE'
- X/* misc - miscellaneous flex routines */
- X
- X/*-
- X * Copyright (c) 1990 The Regents of the University of California.
- X * All rights reserved.
- X *
- X * This code is derived from software contributed to Berkeley by
- X * Vern Paxson.
- X *
- X * The United States Government has rights in this work pursuant
- X * to contract no. DE-AC03-76SF00098 between the United States
- X * Department of Energy and the University of California.
- X *
- X * Redistribution and use in source and binary forms are permitted provided
- X * that: (1) source distributions retain this entire copyright notice and
- X * comment, and (2) distributions including binaries display the following
- X * acknowledgement: ``This product includes software developed by the
- X * University of California, Berkeley and its contributors'' in the
- X * documentation or other materials provided with the distribution and in
- X * all advertising materials mentioning features or use of this software.
- X * Neither the name of the University nor the names of its contributors may
- X * be used to endorse or promote products derived from this software without
- X * specific prior written permission.
- X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
- X * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
- X * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
- X */
- X
- X#ifndef lint
- Xstatic char rcsid[] =
- X "@(#) $Header: /usr/fsys/odin/a/vern/flex/RCS/misc.c,v 2.9 90/08/14 00:10:24 vern Exp $ (LBL)";
- X#endif
- X
- X#include <ctype.h>
- X#include "flexdef.h"
- X
- X
- X/* ANSI C does not guarantee that isascii() is defined */
- X#ifndef isascii
- X#define isascii(c) ((c) <= 0177)
- X#endif
- X
- X
- X
- X/* declare functions that have forward references */
- X
- Xvoid dataflush PROTO(());
- Xint otoi PROTO((Char []));
- X
- X
- X/* action_out - write the actions from the temporary file to lex.yy.c
- X *
- X * synopsis
- X * action_out();
- X *
- X * Copies the action file up to %% (or end-of-file) to lex.yy.c
- X */
- X
- Xvoid action_out()
- X
- X {
- X char buf[MAXLINE];
- X
- X while ( fgets( buf, MAXLINE, temp_action_file ) != NULL )
- X if ( buf[0] == '%' && buf[1] == '%' )
- X break;
- X else
- X fputs( buf, stdout );
- X }
- X
- X
- X/* allocate_array - allocate memory for an integer array of the given size */
- X
- Xvoid *allocate_array( size, element_size )
- Xint size, element_size;
- X
- X {
- X register void *mem;
- X
- X /* on 16-bit int machines (e.g., 80286) we might be trying to
- X * allocate more than a signed int can hold, and that won't
- X * work. Cheap test:
- X */
- X if ( element_size * size <= 0 )
- X flexfatal( "request for < 1 byte in allocate_array()" );
- X
- X mem = (void *) malloc( (unsigned) (element_size * size) );
- X
- X if ( mem == NULL )
- X flexfatal( "memory allocation failed in allocate_array()" );
- X
- X return ( mem );
- X }
- X
- X
- X/* all_lower - true if a string is all lower-case
- X *
- X * synopsis:
- X * Char *str;
- X * int all_lower();
- X * true/false = all_lower( str );
- X */
- X
- Xint all_lower( str )
- Xregister Char *str;
- X
- X {
- X while ( *str )
- X {
- X if ( ! isascii( *str ) || ! islower( *str ) )
- X return ( 0 );
- X ++str;
- X }
- X
- X return ( 1 );
- X }
- X
- X
- X/* all_upper - true if a string is all upper-case
- X *
- X * synopsis:
- X * Char *str;
- X * int all_upper();
- X * true/false = all_upper( str );
- X */
- X
- Xint all_upper( str )
- Xregister Char *str;
- X
- X {
- X while ( *str )
- X {
- X if ( ! isascii( *str ) || ! isupper( (char) *str ) )
- X return ( 0 );
- X ++str;
- X }
- X
- X return ( 1 );
- X }
- X
- X
- X/* bubble - bubble sort an integer array in increasing order
- X *
- X * synopsis
- X * int v[n], n;
- X * bubble( v, n );
- X *
- X * description
- X * sorts the first n elements of array v and replaces them in
- X * increasing order.
- X *
- X * passed
- X * v - the array to be sorted
- X * n - the number of elements of 'v' to be sorted */
- X
- Xvoid bubble( v, n )
- Xint v[], n;
- X
- X {
- X register int i, j, k;
- X
- X for ( i = n; i > 1; --i )
- X for ( j = 1; j < i; ++j )
- X if ( v[j] > v[j + 1] ) /* compare */
- X {
- X k = v[j]; /* exchange */
- X v[j] = v[j + 1];
- X v[j + 1] = k;
- X }
- X }
- X
- X
- X/* clower - replace upper-case letter to lower-case
- X *
- X * synopsis:
- X * Char clower();
- X * int c;
- X * c = clower( c );
- X */
- X
- XChar clower( c )
- Xregister int c;
- X
- X {
- X return ( (isascii( c ) && isupper( c )) ? tolower( c ) : c );
- X }
- X
- X
- X/* copy_string - returns a dynamically allocated copy of a string
- X *
- X * synopsis
- X * char *str, *copy, *copy_string();
- X * copy = copy_string( str );
- X */
- X
- Xchar *copy_string( str )
- Xregister char *str;
- X
- X {
- X register char *c;
- X char *copy;
- X
- X /* find length */
- X for ( c = str; *c; ++c )
- X ;
- X
- X copy = malloc( (unsigned) ((c - str + 1) * sizeof( char )) );
- X
- X if ( copy == NULL )
- X flexfatal( "dynamic memory failure in copy_string()" );
- X
- X for ( c = copy; (*c++ = *str++); )
- X ;
- X
- X return ( copy );
- X }
- X
- X
- X/* copy_unsigned_string -
- X * returns a dynamically allocated copy of a (potentially) unsigned string
- X *
- X * synopsis
- X * Char *str, *copy, *copy_unsigned_string();
- X * copy = copy_unsigned_string( str );
- X */
- X
- XChar *copy_unsigned_string( str )
- Xregister Char *str;
- X
- X {
- X register Char *c;
- X Char *copy;
- X
- X /* find length */
- X for ( c = str; *c; ++c )
- X ;
- X
- X copy = (Char *) malloc( (unsigned) ((c - str + 1) * sizeof( Char )) );
- X
- X if ( copy == NULL )
- X flexfatal( "dynamic memory failure in copy_unsigned_string()" );
- X
- X for ( c = copy; (*c++ = *str++); )
- X ;
- X
- X return ( copy );
- X }
- X
- X
- X/* cshell - shell sort a character array in increasing order
- X *
- X * synopsis
- X *
- X * Char v[n];
- X * int n, special_case_0;
- X * cshell( v, n, special_case_0 );
- X *
- X * description
- X * does a shell sort of the first n elements of array v.
- X * If special_case_0 is true, then any element equal to 0
- X * is instead assumed to have infinite weight.
- X *
- X * passed
- X * v - array to be sorted
- X * n - number of elements of v to be sorted
- X */
- X
- Xvoid cshell( v, n, special_case_0 )
- XChar v[];
- Xint n, special_case_0;
- X
- X {
- X int gap, i, j, jg;
- X Char k;
- X
- X for ( gap = n / 2; gap > 0; gap = gap / 2 )
- X for ( i = gap; i < n; ++i )
- X for ( j = i - gap; j >= 0; j = j - gap )
- X {
- X jg = j + gap;
- X
- X if ( special_case_0 )
- X {
- X if ( v[jg] == 0 )
- X break;
- X
- X else if ( v[j] != 0 && v[j] <= v[jg] )
- X break;
- X }
- X
- X else if ( v[j] <= v[jg] )
- X break;
- X
- X k = v[j];
- X v[j] = v[jg];
- X v[jg] = k;
- X }
- X }
- X
- X
- X/* dataend - finish up a block of data declarations
- X *
- X * synopsis
- X * dataend();
- X */
- X
- Xvoid dataend()
- X
- X {
- X if ( datapos > 0 )
- X dataflush();
- X
- X /* add terminator for initialization */
- X puts( " } ;\n" );
- X
- X dataline = 0;
- X datapos = 0;
- X }
- X
- X
- X
- X/* dataflush - flush generated data statements
- X *
- X * synopsis
- X * dataflush();
- X */
- X
- Xvoid dataflush()
- X
- X {
- X putchar( '\n' );
- X
- X if ( ++dataline >= NUMDATALINES )
- X {
- X /* put out a blank line so that the table is grouped into
- X * large blocks that enable the user to find elements easily
- X */
- X putchar( '\n' );
- X dataline = 0;
- X }
- X
- X /* reset the number of characters written on the current line */
- X datapos = 0;
- X }
- X
- X
- X/* flexerror - report an error message and terminate
- X *
- X * synopsis
- X * char msg[];
- X * flexerror( msg );
- X */
- X
- Xvoid flexerror( msg )
- Xchar msg[];
- X
- X {
- X fprintf( stderr, "%s: %s\n", program_name, msg );
- X
- X flexend( 1 );
- X }
- X
- X
- X/* flexfatal - report a fatal error message and terminate
- X *
- X * synopsis
- X * char msg[];
- X * flexfatal( msg );
- X */
- X
- Xvoid flexfatal( msg )
- Xchar msg[];
- X
- X {
- X fprintf( stderr, "%s: fatal internal error, %s\n", program_name, msg );
- X flexend( 1 );
- X }
- X
- X
- X/* flex_gettime - return current time
- X *
- X * synopsis
- X * char *flex_gettime(), *time_str;
- X * time_str = flex_gettime();
- X *
- X * note
- X * the routine name has the "flex_" prefix because of name clashes
- X * with Turbo-C
- X */
- X
- X/* include sys/types.h to use time_t and make lint happy */
- X
- X#ifndef MS_DOS
- X#ifndef VMS
- X#include <sys/types.h>
- X#else
- X#include <types.h>
- X#endif
- X#endif
- X
- X#ifdef MS_DOS
- X#include <time.h>
- Xtypedef long time_t;
- X#endif
- X
- Xchar *flex_gettime()
- X
- X {
- X time_t t, time();
- X char *result, *ctime(), *copy_string();
- X
- X t = time( (long *) 0 );
- X
- X result = copy_string( ctime( &t ) );
- X
- X /* get rid of trailing newline */
- X result[24] = '\0';
- X
- X return ( result );
- X }
- X
- X
- X/* lerrif - report an error message formatted with one integer argument
- X *
- X * synopsis
- X * char msg[];
- X * int arg;
- X * lerrif( msg, arg );
- X */
- X
- Xvoid lerrif( msg, arg )
- Xchar msg[];
- Xint arg;
- X
- X {
- X char errmsg[MAXLINE];
- X (void) sprintf( errmsg, msg, arg );
- X flexerror( errmsg );
- X }
- X
- X
- X/* lerrsf - report an error message formatted with one string argument
- X *
- X * synopsis
- X * char msg[], arg[];
- X * lerrsf( msg, arg );
- X */
- X
- Xvoid lerrsf( msg, arg )
- Xchar msg[], arg[];
- X
- X {
- X char errmsg[MAXLINE];
- X
- X (void) sprintf( errmsg, msg, arg );
- X flexerror( errmsg );
- X }
- X
- X
- X/* htoi - convert a hexadecimal digit string to an integer value
- X *
- X * synopsis:
- X * int val, htoi();
- X * Char str[];
- X * val = htoi( str );
- X */
- X
- Xint htoi( str )
- XChar str[];
- X
- X {
- X int result;
- X
- X (void) sscanf( (char *) str, "%x", &result );
- X
- X return ( result );
- X }
- X
- X
- X/* is_hex_digit - returns true if a character is a valid hex digit, false
- X * otherwise
- X *
- X * synopsis:
- X * int true_or_false, is_hex_digit();
- X * int ch;
- X * val = is_hex_digit( ch );
- X */
- X
- Xint is_hex_digit( ch )
- Xint ch;
- X
- X {
- X if ( isdigit( ch ) )
- X return ( 1 );
- X
- X switch ( clower( ch ) )
- X {
- X case 'a':
- X case 'b':
- X case 'c':
- X case 'd':
- X case 'e':
- X case 'f':
- X return ( 1 );
- X
- X default:
- X return ( 0 );
- X }
- X }
- X
- X
- X/* line_directive_out - spit out a "# line" statement */
- X
- Xvoid line_directive_out( output_file_name )
- XFILE *output_file_name;
- X
- X {
- X if ( infilename && gen_line_dirs )
- X fprintf( output_file_name, "# line %d \"%s\"\n", linenum, infilename );
- X }
- X
- X
- X/* mk2data - generate a data statement for a two-dimensional array
- X *
- X * synopsis
- X * int value;
- X * mk2data( value );
- X *
- X * generates a data statement initializing the current 2-D array to "value"
- X */
- Xvoid mk2data( value )
- Xint value;
- X
- X {
- X if ( datapos >= NUMDATAITEMS )
- X {
- X putchar( ',' );
- X dataflush();
- X }
- X
- X if ( datapos == 0 )
- X /* indent */
- X fputs( " ", stdout );
- X
- X else
- X putchar( ',' );
- X
- X ++datapos;
- X
- X printf( "%5d", value );
- X }
- X
- X
- X/* mkdata - generate a data statement
- X *
- X * synopsis
- X * int value;
- X * mkdata( value );
- X *
- X * generates a data statement initializing the current array element to
- X * "value"
- X */
- Xvoid mkdata( value )
- Xint value;
- X
- X {
- X if ( datapos >= NUMDATAITEMS )
- X {
- X putchar( ',' );
- X dataflush();
- X }
- X
- X if ( datapos == 0 )
- X /* indent */
- X fputs( " ", stdout );
- X
- X else
- X putchar( ',' );
- X
- X ++datapos;
- X
- X printf( "%5d", value );
- X }
- X
- X
- X/* myctoi - return the integer represented by a string of digits
- X *
- X * synopsis
- X * Char array[];
- X * int val, myctoi();
- X * val = myctoi( array );
- X *
- X */
- X
- Xint myctoi( array )
- XChar array[];
- X
- X {
- X int val = 0;
- X
- X (void) sscanf( (char *) array, "%d", &val );
- X
- X return ( val );
- X }
- X
- X
- X/* myesc - return character corresponding to escape sequence
- X *
- X * synopsis
- X * Char array[], c, myesc();
- X * c = myesc( array );
- X *
- X */
- X
- XChar myesc( array )
- XChar array[];
- X
- X {
- X Char c, esc_char;
- X register int sptr;
- X
- X switch ( array[1] )
- X {
- X case 'a': return ( '\a' );
- X case 'b': return ( '\b' );
- X case 'f': return ( '\f' );
- X case 'n': return ( '\n' );
- X case 'r': return ( '\r' );
- X case 't': return ( '\t' );
- X case 'v': return ( '\v' );
- X
- X case '0':
- X case '1':
- X case '2':
- X case '3':
- X case '4':
- X case '5':
- X case '6':
- X case '7':
- X case '8':
- X case '9':
- X { /* \<octal> */
- X sptr = 1;
- X
- X while ( isascii( array[sptr] ) && isdigit( array[sptr] ) )
- X /* don't increment inside loop control because if
- X * isdigit() is a macro it might expand into multiple
- X * increments ...
- X */
- X ++sptr;
- X
- X c = array[sptr];
- X array[sptr] = '\0';
- X
- X esc_char = otoi( array + 1 );
- X
- X array[sptr] = c;
- X
- X return ( esc_char );
- X }
- X
- X case 'x':
- X { /* \x<hex> */
- X int sptr = 2;
- X
- X while ( isascii( array[sptr] ) && is_hex_digit( array[sptr] ) )
- X /* don't increment inside loop control because if
- X * isdigit() is a macro it might expand into multiple
- X * increments ...
- X */
- X ++sptr;
- X
- X c = array[sptr];
- X array[sptr] = '\0';
- X
- X esc_char = htoi( array + 2 );
- X
- X array[sptr] = c;
- X
- X return ( esc_char );
- X }
- X
- X default:
- X return ( array[1] );
- X }
- X }
- X
- X
- X/* otoi - convert an octal digit string to an integer value
- X *
- X * synopsis:
- X * int val, otoi();
- X * Char str[];
- X * val = otoi( str );
- X */
- X
- Xint otoi( str )
- XChar str[];
- X
- X {
- X int result;
- X
- X (void) sscanf( (char *) str, "%o", &result );
- X
- X return ( result );
- X }
- X
- X
- X/* readable_form - return the the human-readable form of a character
- X *
- X * synopsis:
- X * int c;
- X * char *readable_form();
- X * <string> = readable_form( c );
- X *
- X * The returned string is in static storage.
- X */
- X
- Xchar *readable_form( c )
- Xregister int c;
- X
- X {
- X static char rform[10];
- X
- X if ( (c >= 0 && c < 32) || c >= 127 )
- X {
- X switch ( c )
- X {
- X case '\n': return ( "\\n" );
- X case '\t': return ( "\\t" );
- X case '\f': return ( "\\f" );
- X case '\r': return ( "\\r" );
- X case '\b': return ( "\\b" );
- X
- X default:
- X (void) sprintf( rform, "\\%.3o", c );
- X return ( rform );
- X }
- X }
- X
- X else if ( c == ' ' )
- X return ( "' '" );
- X
- X else
- X {
- X rform[0] = c;
- X rform[1] = '\0';
- X
- X return ( rform );
- X }
- X }
- X
- X
- X/* reallocate_array - increase the size of a dynamic array */
- X
- Xvoid *reallocate_array( array, size, element_size )
- Xvoid *array;
- Xint size, element_size;
- X
- X {
- X register void *new_array;
- X
- X /* same worry as in allocate_array(): */
- X if ( size * element_size <= 0 )
- X flexfatal( "attempt to increase array size by less than 1 byte" );
- X
- X new_array =
- X (void *) realloc( (char *)array, (unsigned) (size * element_size ));
- X
- X if ( new_array == NULL )
- X flexfatal( "attempt to increase array size failed" );
- X
- X return ( new_array );
- X }
- X
- X
- X/* skelout - write out one section of the skeleton file
- X *
- X * synopsis
- X * skelout();
- X *
- X * DESCRIPTION
- X * Copies from skelfile to stdout until a line beginning with "%%" or
- X * EOF is found.
- X */
- Xvoid skelout()
- X
- X {
- X char buf[MAXLINE];
- X
- X while ( fgets( buf, MAXLINE, skelfile ) != NULL )
- X if ( buf[0] == '%' && buf[1] == '%' )
- X break;
- X else
- X fputs( buf, stdout );
- X }
- X
- X
- X/* transition_struct_out - output a yy_trans_info structure
- X *
- X * synopsis
- X * int element_v, element_n;
- X * transition_struct_out( element_v, element_n );
- X *
- X * outputs the yy_trans_info structure with the two elements, element_v and
- X * element_n. Formats the output with spaces and carriage returns.
- X */
- X
- Xvoid transition_struct_out( element_v, element_n )
- Xint element_v, element_n;
- X
- X {
- X printf( "%7d, %5d,", element_v, element_n );
- X
- X datapos += TRANS_STRUCT_PRINT_LENGTH;
- X
- X if ( datapos >= 75 )
- X {
- X putchar( '\n' );
- X
- X if ( ++dataline % 10 == 0 )
- X putchar( '\n' );
- X
- X datapos = 0;
- X }
- X }
- END_OF_FILE
- if test 14912 -ne `wc -c <'misc.c'`; then
- echo shar: \"'misc.c'\" unpacked with wrong size!
- fi
- # end of 'misc.c'
- fi
- if test -f 'nfa.c' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'nfa.c'\"
- else
- echo shar: Extracting \"'nfa.c'\" \(17603 characters\)
- sed "s/^X//" >'nfa.c' <<'END_OF_FILE'
- X/* nfa - NFA construction routines */
- X
- X/*-
- X * Copyright (c) 1990 The Regents of the University of California.
- X * All rights reserved.
- X *
- X * This code is derived from software contributed to Berkeley by
- X * Vern Paxson.
- X *
- X * The United States Government has rights in this work pursuant
- X * to contract no. DE-AC03-76SF00098 between the United States
- X * Department of Energy and the University of California.
- X *
- X * Redistribution and use in source and binary forms are permitted provided
- X * that: (1) source distributions retain this entire copyright notice and
- X * comment, and (2) distributions including binaries display the following
- X * acknowledgement: ``This product includes software developed by the
- X * University of California, Berkeley and its contributors'' in the
- X * documentation or other materials provided with the distribution and in
- X * all advertising materials mentioning features or use of this software.
- X * Neither the name of the University nor the names of its contributors may
- X * be used to endorse or promote products derived from this software without
- X * specific prior written permission.
- X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
- X * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
- X * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
- X */
- X
- X#ifndef lint
- Xstatic char rcsid[] =
- X "@(#) $Header: /usr/fsys/odin/a/vern/flex/RCS/nfa.c,v 2.6 90/06/27 23:48:29 vern Exp $ (LBL)";
- X#endif
- X
- X#include "flexdef.h"
- X
- X
- X/* declare functions that have forward references */
- X
- Xint dupmachine PROTO((int));
- Xvoid mkxtion PROTO((int, int));
- X
- X
- X/* add_accept - add an accepting state to a machine
- X *
- X * synopsis
- X *
- X * add_accept( mach, accepting_number );
- X *
- X * accepting_number becomes mach's accepting number.
- X */
- X
- Xvoid add_accept( mach, accepting_number )
- Xint mach, accepting_number;
- X
- X {
- X /* hang the accepting number off an epsilon state. if it is associated
- X * with a state that has a non-epsilon out-transition, then the state
- X * will accept BEFORE it makes that transition, i.e., one character
- X * too soon
- X */
- X
- X if ( transchar[finalst[mach]] == SYM_EPSILON )
- X accptnum[finalst[mach]] = accepting_number;
- X
- X else
- X {
- X int astate = mkstate( SYM_EPSILON );
- X accptnum[astate] = accepting_number;
- X mach = link_machines( mach, astate );
- X }
- X }
- X
- X
- X/* copysingl - make a given number of copies of a singleton machine
- X *
- X * synopsis
- X *
- X * newsng = copysingl( singl, num );
- X *
- X * newsng - a new singleton composed of num copies of singl
- X * singl - a singleton machine
- X * num - the number of copies of singl to be present in newsng
- X */
- X
- Xint copysingl( singl, num )
- Xint singl, num;
- X
- X {
- X int copy, i;
- X
- X copy = mkstate( SYM_EPSILON );
- X
- X for ( i = 1; i <= num; ++i )
- X copy = link_machines( copy, dupmachine( singl ) );
- X
- X return ( copy );
- X }
- X
- X
- X/* dumpnfa - debugging routine to write out an nfa
- X *
- X * synopsis
- X * int state1;
- X * dumpnfa( state1 );
- X */
- X
- Xvoid dumpnfa( state1 )
- Xint state1;
- X
- X {
- X int sym, tsp1, tsp2, anum, ns;
- X
- X fprintf( stderr, "\n\n********** beginning dump of nfa with start state %d\n",
- X state1 );
- X
- X /* we probably should loop starting at firstst[state1] and going to
- X * lastst[state1], but they're not maintained properly when we "or"
- X * all of the rules together. So we use our knowledge that the machine
- X * starts at state 1 and ends at lastnfa.
- X */
- X
- X /* for ( ns = firstst[state1]; ns <= lastst[state1]; ++ns ) */
- X for ( ns = 1; ns <= lastnfa; ++ns )
- X {
- X fprintf( stderr, "state # %4d\t", ns );
- X
- X sym = transchar[ns];
- X tsp1 = trans1[ns];
- X tsp2 = trans2[ns];
- X anum = accptnum[ns];
- X
- X fprintf( stderr, "%3d: %4d, %4d", sym, tsp1, tsp2 );
- X
- X if ( anum != NIL )
- X fprintf( stderr, " [%d]", anum );
- X
- X fprintf( stderr, "\n" );
- X }
- X
- X fprintf( stderr, "********** end of dump\n" );
- X }
- X
- X
- X/* dupmachine - make a duplicate of a given machine
- X *
- X * synopsis
- X *
- X * copy = dupmachine( mach );
- X *
- X * copy - holds duplicate of mach
- X * mach - machine to be duplicated
- X *
- X * note that the copy of mach is NOT an exact duplicate; rather, all the
- X * transition states values are adjusted so that the copy is self-contained,
- X * as the original should have been.
- X *
- X * also note that the original MUST be contiguous, with its low and high
- X * states accessible by the arrays firstst and lastst
- X */
- X
- Xint dupmachine( mach )
- Xint mach;
- X
- X {
- X int i, init, state_offset;
- X int state = 0;
- X int last = lastst[mach];
- X
- X for ( i = firstst[mach]; i <= last; ++i )
- X {
- X state = mkstate( transchar[i] );
- X
- X if ( trans1[i] != NO_TRANSITION )
- X {
- X mkxtion( finalst[state], trans1[i] + state - i );
- X
- X if ( transchar[i] == SYM_EPSILON && trans2[i] != NO_TRANSITION )
- X mkxtion( finalst[state], trans2[i] + state - i );
- X }
- X
- X accptnum[state] = accptnum[i];
- X }
- X
- X if ( state == 0 )
- X flexfatal( "empty machine in dupmachine()" );
- X
- X state_offset = state - i + 1;
- X
- X init = mach + state_offset;
- X firstst[init] = firstst[mach] + state_offset;
- X finalst[init] = finalst[mach] + state_offset;
- X lastst[init] = lastst[mach] + state_offset;
- X
- X return ( init );
- X }
- X
- X
- X/* finish_rule - finish up the processing for a rule
- X *
- X * synopsis
- X *
- X * finish_rule( mach, variable_trail_rule, headcnt, trailcnt );
- X *
- X * An accepting number is added to the given machine. If variable_trail_rule
- X * is true then the rule has trailing context and both the head and trail
- X * are variable size. Otherwise if headcnt or trailcnt is non-zero then
- X * the machine recognizes a pattern with trailing context and headcnt is
- X * the number of characters in the matched part of the pattern, or zero
- X * if the matched part has variable length. trailcnt is the number of
- X * trailing context characters in the pattern, or zero if the trailing
- X * context has variable length.
- X */
- X
- Xvoid finish_rule( mach, variable_trail_rule, headcnt, trailcnt )
- Xint mach, variable_trail_rule, headcnt, trailcnt;
- X
- X {
- X add_accept( mach, num_rules );
- X
- X /* we did this in new_rule(), but it often gets the wrong
- X * number because we do it before we start parsing the current rule
- X */
- X rule_linenum[num_rules] = linenum;
- X
- X /* if this is a continued action, then the line-number has
- X * already been updated, giving us the wrong number
- X */
- X if ( continued_action )
- X --rule_linenum[num_rules];
- X
- X fprintf( temp_action_file, "case %d:\n", num_rules );
- X
- X if ( variable_trail_rule )
- X {
- X rule_type[num_rules] = RULE_VARIABLE;
- X
- X if ( performance_report )
- X fprintf( stderr, "Variable trailing context rule at line %d\n",
- X rule_linenum[num_rules] );
- X
- X variable_trailing_context_rules = true;
- X }
- X
- X else
- X {
- X rule_type[num_rules] = RULE_NORMAL;
- X
- X if ( headcnt > 0 || trailcnt > 0 )
- X {
- X /* do trailing context magic to not match the trailing characters */
- X char *scanner_cp = "yy_c_buf_p = yy_cp";
- X char *scanner_bp = "yy_bp";
- X
- X fprintf( temp_action_file,
- X "*yy_cp = yy_hold_char; /* undo effects of setting up yytext */\n" );
- X
- X if ( headcnt > 0 )
- X fprintf( temp_action_file, "%s = %s + %d;\n",
- X scanner_cp, scanner_bp, headcnt );
- X
- X else
- X fprintf( temp_action_file,
- X "%s -= %d;\n", scanner_cp, trailcnt );
- X
- X fprintf( temp_action_file,
- X "YY_DO_BEFORE_ACTION; /* set up yytext again */\n" );
- X }
- X }
- X
- X line_directive_out( temp_action_file );
- X }
- X
- X
- X/* link_machines - connect two machines together
- X *
- X * synopsis
- X *
- X * new = link_machines( first, last );
- X *
- X * new - a machine constructed by connecting first to last
- X * first - the machine whose successor is to be last
- X * last - the machine whose predecessor is to be first
- X *
- X * note: this routine concatenates the machine first with the machine
- X * last to produce a machine new which will pattern-match first first
- X * and then last, and will fail if either of the sub-patterns fails.
- X * FIRST is set to new by the operation. last is unmolested.
- X */
- X
- Xint link_machines( first, last )
- Xint first, last;
- X
- X {
- X if ( first == NIL )
- X return ( last );
- X
- X else if ( last == NIL )
- X return ( first );
- X
- X else
- X {
- X mkxtion( finalst[first], last );
- X finalst[first] = finalst[last];
- X lastst[first] = max( lastst[first], lastst[last] );
- X firstst[first] = min( firstst[first], firstst[last] );
- X
- X return ( first );
- X }
- X }
- X
- X
- X/* mark_beginning_as_normal - mark each "beginning" state in a machine
- X * as being a "normal" (i.e., not trailing context-
- X * associated) states
- X *
- X * synopsis
- X *
- X * mark_beginning_as_normal( mach )
- X *
- X * mach - machine to mark
- X *
- X * The "beginning" states are the epsilon closure of the first state
- X */
- X
- Xvoid mark_beginning_as_normal( mach )
- Xregister int mach;
- X
- X {
- X switch ( state_type[mach] )
- X {
- X case STATE_NORMAL:
- X /* oh, we've already visited here */
- X return;
- X
- X case STATE_TRAILING_CONTEXT:
- X state_type[mach] = STATE_NORMAL;
- X
- X if ( transchar[mach] == SYM_EPSILON )
- X {
- X if ( trans1[mach] != NO_TRANSITION )
- X mark_beginning_as_normal( trans1[mach] );
- X
- X if ( trans2[mach] != NO_TRANSITION )
- X mark_beginning_as_normal( trans2[mach] );
- X }
- X break;
- X
- X default:
- X flexerror( "bad state type in mark_beginning_as_normal()" );
- X break;
- X }
- X }
- X
- X
- X/* mkbranch - make a machine that branches to two machines
- X *
- X * synopsis
- X *
- X * branch = mkbranch( first, second );
- X *
- X * branch - a machine which matches either first's pattern or second's
- X * first, second - machines whose patterns are to be or'ed (the | operator)
- X *
- X * note that first and second are NEITHER destroyed by the operation. Also,
- X * the resulting machine CANNOT be used with any other "mk" operation except
- X * more mkbranch's. Compare with mkor()
- X */
- X
- Xint mkbranch( first, second )
- Xint first, second;
- X
- X {
- X int eps;
- X
- X if ( first == NO_TRANSITION )
- X return ( second );
- X
- X else if ( second == NO_TRANSITION )
- X return ( first );
- X
- X eps = mkstate( SYM_EPSILON );
- X
- X mkxtion( eps, first );
- X mkxtion( eps, second );
- X
- X return ( eps );
- X }
- X
- X
- X/* mkclos - convert a machine into a closure
- X *
- X * synopsis
- X * new = mkclos( state );
- X *
- X * new - a new state which matches the closure of "state"
- X */
- X
- Xint mkclos( state )
- Xint state;
- X
- X {
- X return ( mkopt( mkposcl( state ) ) );
- X }
- X
- X
- X/* mkopt - make a machine optional
- X *
- X * synopsis
- X *
- X * new = mkopt( mach );
- X *
- X * new - a machine which optionally matches whatever mach matched
- X * mach - the machine to make optional
- X *
- X * notes:
- X * 1. mach must be the last machine created
- X * 2. mach is destroyed by the call
- X */
- X
- Xint mkopt( mach )
- Xint mach;
- X
- X {
- X int eps;
- X
- X if ( ! SUPER_FREE_EPSILON(finalst[mach]) )
- X {
- X eps = mkstate( SYM_EPSILON );
- X mach = link_machines( mach, eps );
- X }
- X
- X /* can't skimp on the following if FREE_EPSILON(mach) is true because
- X * some state interior to "mach" might point back to the beginning
- X * for a closure
- X */
- X eps = mkstate( SYM_EPSILON );
- X mach = link_machines( eps, mach );
- X
- X mkxtion( mach, finalst[mach] );
- X
- X return ( mach );
- X }
- X
- X
- X/* mkor - make a machine that matches either one of two machines
- X *
- X * synopsis
- X *
- X * new = mkor( first, second );
- X *
- X * new - a machine which matches either first's pattern or second's
- X * first, second - machines whose patterns are to be or'ed (the | operator)
- X *
- X * note that first and second are both destroyed by the operation
- X * the code is rather convoluted because an attempt is made to minimize
- X * the number of epsilon states needed
- X */
- X
- Xint mkor( first, second )
- Xint first, second;
- X
- X {
- X int eps, orend;
- X
- X if ( first == NIL )
- X return ( second );
- X
- X else if ( second == NIL )
- X return ( first );
- X
- X else
- X {
- X /* see comment in mkopt() about why we can't use the first state
- X * of "first" or "second" if they satisfy "FREE_EPSILON"
- X */
- X eps = mkstate( SYM_EPSILON );
- X
- X first = link_machines( eps, first );
- X
- X mkxtion( first, second );
- X
- X if ( SUPER_FREE_EPSILON(finalst[first]) &&
- X accptnum[finalst[first]] == NIL )
- X {
- X orend = finalst[first];
- X mkxtion( finalst[second], orend );
- X }
- X
- X else if ( SUPER_FREE_EPSILON(finalst[second]) &&
- X accptnum[finalst[second]] == NIL )
- X {
- X orend = finalst[second];
- X mkxtion( finalst[first], orend );
- X }
- X
- X else
- X {
- X eps = mkstate( SYM_EPSILON );
- X
- X first = link_machines( first, eps );
- X orend = finalst[first];
- X
- X mkxtion( finalst[second], orend );
- X }
- X }
- X
- X finalst[first] = orend;
- X return ( first );
- X }
- X
- X
- X/* mkposcl - convert a machine into a positive closure
- X *
- X * synopsis
- X * new = mkposcl( state );
- X *
- X * new - a machine matching the positive closure of "state"
- X */
- X
- Xint mkposcl( state )
- Xint state;
- X
- X {
- X int eps;
- X
- X if ( SUPER_FREE_EPSILON(finalst[state]) )
- X {
- X mkxtion( finalst[state], state );
- X return ( state );
- X }
- X
- X else
- X {
- X eps = mkstate( SYM_EPSILON );
- X mkxtion( eps, state );
- X return ( link_machines( state, eps ) );
- X }
- X }
- X
- X
- X/* mkrep - make a replicated machine
- X *
- X * synopsis
- X * new = mkrep( mach, lb, ub );
- X *
- X * new - a machine that matches whatever "mach" matched from "lb"
- X * number of times to "ub" number of times
- X *
- X * note
- X * if "ub" is INFINITY then "new" matches "lb" or more occurrences of "mach"
- X */
- X
- Xint mkrep( mach, lb, ub )
- Xint mach, lb, ub;
- X
- X {
- X int base_mach, tail, copy, i;
- X
- X base_mach = copysingl( mach, lb - 1 );
- X
- X if ( ub == INFINITY )
- X {
- X copy = dupmachine( mach );
- X mach = link_machines( mach,
- X link_machines( base_mach, mkclos( copy ) ) );
- X }
- X
- X else
- X {
- X tail = mkstate( SYM_EPSILON );
- X
- X for ( i = lb; i < ub; ++i )
- X {
- X copy = dupmachine( mach );
- X tail = mkopt( link_machines( copy, tail ) );
- X }
- X
- X mach = link_machines( mach, link_machines( base_mach, tail ) );
- X }
- X
- X return ( mach );
- X }
- X
- X
- X/* mkstate - create a state with a transition on a given symbol
- X *
- X * synopsis
- X *
- X * state = mkstate( sym );
- X *
- X * state - a new state matching sym
- X * sym - the symbol the new state is to have an out-transition on
- X *
- X * note that this routine makes new states in ascending order through the
- X * state array (and increments LASTNFA accordingly). The routine DUPMACHINE
- X * relies on machines being made in ascending order and that they are
- X * CONTIGUOUS. Change it and you will have to rewrite DUPMACHINE (kludge
- X * that it admittedly is)
- X */
- X
- Xint mkstate( sym )
- Xint sym;
- X
- X {
- X if ( ++lastnfa >= current_mns )
- X {
- X if ( (current_mns += MNS_INCREMENT) >= MAXIMUM_MNS )
- X lerrif( "input rules are too complicated (>= %d NFA states)",
- X current_mns );
- X
- X ++num_reallocs;
- X
- X firstst = reallocate_integer_array( firstst, current_mns );
- X lastst = reallocate_integer_array( lastst, current_mns );
- X finalst = reallocate_integer_array( finalst, current_mns );
- X transchar = reallocate_integer_array( transchar, current_mns );
- X trans1 = reallocate_integer_array( trans1, current_mns );
- X trans2 = reallocate_integer_array( trans2, current_mns );
- X accptnum = reallocate_integer_array( accptnum, current_mns );
- X assoc_rule = reallocate_integer_array( assoc_rule, current_mns );
- X state_type = reallocate_integer_array( state_type, current_mns );
- X }
- X
- X firstst[lastnfa] = lastnfa;
- X finalst[lastnfa] = lastnfa;
- X lastst[lastnfa] = lastnfa;
- X transchar[lastnfa] = sym;
- X trans1[lastnfa] = NO_TRANSITION;
- X trans2[lastnfa] = NO_TRANSITION;
- X accptnum[lastnfa] = NIL;
- X assoc_rule[lastnfa] = num_rules;
- X state_type[lastnfa] = current_state_type;
- X
- X /* fix up equivalence classes base on this transition. Note that any
- X * character which has its own transition gets its own equivalence class.
- X * Thus only characters which are only in character classes have a chance
- X * at being in the same equivalence class. E.g. "a|b" puts 'a' and 'b'
- X * into two different equivalence classes. "[ab]" puts them in the same
- X * equivalence class (barring other differences elsewhere in the input).
- X */
- X
- X if ( sym < 0 )
- X {
- X /* we don't have to update the equivalence classes since that was
- X * already done when the ccl was created for the first time
- X */
- X }
- X
- X else if ( sym == SYM_EPSILON )
- X ++numeps;
- X
- X else
- X {
- X if ( useecs )
- X /* map NUL's to csize */
- X mkechar( sym ? sym : csize, nextecm, ecgroup );
- X }
- X
- X return ( lastnfa );
- X }
- X
- X
- X/* mkxtion - make a transition from one state to another
- X *
- X * synopsis
- X *
- X * mkxtion( statefrom, stateto );
- X *
- X * statefrom - the state from which the transition is to be made
- X * stateto - the state to which the transition is to be made
- X */
- X
- Xvoid mkxtion( statefrom, stateto )
- Xint statefrom, stateto;
- X
- X {
- X if ( trans1[statefrom] == NO_TRANSITION )
- X trans1[statefrom] = stateto;
- X
- X else if ( (transchar[statefrom] != SYM_EPSILON) ||
- X (trans2[statefrom] != NO_TRANSITION) )
- X flexfatal( "found too many transitions in mkxtion()" );
- X
- X else
- X { /* second out-transition for an epsilon state */
- X ++eps2;
- X trans2[statefrom] = stateto;
- X }
- X }
- X
- X/* new_rule - initialize for a new rule
- X *
- X * synopsis
- X *
- X * new_rule();
- X *
- X * the global num_rules is incremented and the any corresponding dynamic
- X * arrays (such as rule_type[]) are grown as needed.
- X */
- X
- Xvoid new_rule()
- X
- X {
- X if ( ++num_rules >= current_max_rules )
- X {
- X ++num_reallocs;
- X current_max_rules += MAX_RULES_INCREMENT;
- X rule_type = reallocate_integer_array( rule_type, current_max_rules );
- X rule_linenum =
- X reallocate_integer_array( rule_linenum, current_max_rules );
- X }
- X
- X if ( num_rules > MAX_RULE )
- X lerrif( "too many rules (> %d)!", MAX_RULE );
- X
- X rule_linenum[num_rules] = linenum;
- X }
- END_OF_FILE
- if test 17603 -ne `wc -c <'nfa.c'`; then
- echo shar: \"'nfa.c'\" unpacked with wrong size!
- fi
- # end of 'nfa.c'
- fi
- echo shar: End of archive 8 \(of 10\).
- cp /dev/null ark8isdone
- MISSING=""
- for I in 1 2 3 4 5 6 7 8 9 10 ; do
- if test ! -f ark${I}isdone ; then
- MISSING="${MISSING} ${I}"
- fi
- done
- if test "${MISSING}" = "" ; then
- echo You have unpacked all 10 archives.
- rm -f ark[1-9]isdone ark[1-9][0-9]isdone
- else
- echo You still must unpack the following archives:
- echo " " ${MISSING}
- fi
- exit 0
- exit 0 # Just in case...
-